HDBSCAN clustering for identification of states with lignin dimers bound to \beta-cyclodextrin from molecular dynamics simulations
Author
Brian Novak
Introduction
In our previous study1, we investigated the interaction between \beta-cyclodextrin2 and lignin dimer derivatives through experimental methods, molecular dynamics simulations, and docking. The relevant chemical structures are depicted in Figure 1. An introduction to the project can be found here, and a summary of the publication is available here. During the molecular dynamics simulations, we observed multiple types of dimer-cyclodextrin bound states. We estimated the proportions of these states in unbiased simulations using the following procedure:
Computed a large number of collective variables including angle between lignin dimer and \beta-cyclodextrin principal axes, distances between atoms in the \beta-cyclodextrin molecule to atoms in the lignin dimer molecule, and lignin dihedral angles
Clustered the trajectory configurations using DBSCAN clustering to separate the different states into distinct clusters
Counted the number of points in each cluster and computed proportions of each one
The procedure details can be found in the Supporting Information on pages 9-18. The primary challenge was encountered in step 3, where the states were not distinctly separated. Trial and error was necessary to select DBSCAN parameters that successfully differentiated the clusters.
Here, only three collective variables were carefully chosen to separate the observed states. Therefore, dimensionality reduction was not required. Additionally, HDBSCAN was used instead of DBSCAN for clustering. Finding the point where the number of clusters as a function of the min_samples = min_cluster_size parameter for HDBSCAN started to never increase provided a way to identify a “natural” number of clusters (see Determination of min_samples for details on the algorithm). The final clusters were well separated. The analysis revealed states that were not apparent based on just visual inspection of the configurations in the trajectories.
Figure 1: Structures of lignin dimer derivatives (1-3), and \beta-cyclodextrin. The lower right structures are 3D top and side views of \beta-cyclodextrin. The 3D representations show \beta-cyclodextrin from top and side views, with hydrogen atoms in white, carbon atoms in cyan, and oxygen atoms in red. The face of \beta-cyclodextrin with two hydroxyl (-OH) groups per unit (seen in the top view) is referred to as the secondary face, while the other face is referred to as the primary face. rdkit 2023.03.3 was used for the 2D representations and VMD3 1.9.3 was used to create the 3D representations.
Code
"""The files related to this figure are in ./000_Attachments/fig-dimers_BCD_labelled.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from pathlib import Pathimport subprocessimport numpy as npfrom rdkit import Chemfrom rdkit.Chem import Drawimport warnings# Suppress UserWarningwarnings.filterwarnings("ignore", category=UserWarning)# Generate dimer structuressmi_dir ="./000_Attachments/fig-dimers_BCD_labelled"smi_list = [Path(smi_dir, smi) for smi in ['GG.smi', 'TGG.smi', 'GGBB.smi']]rdkit_list = [Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0]) for smi in smi_list]dimers_img = Draw.MolsToGridImage(rdkit_list, molsPerRow=3, subImgSize=(400, 200))outfile = Path(smi_dir, "dimers.png")withopen(outfile, "wb") as png: png.write(dimers_img.data)subprocess.run(f"mogrify -trim {outfile}", shell=True) # ImageMagick# Generate cyclodextrin 2D structuresmi = Path(smi_dir, "cyclodextrin.smi")rdkit = Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0])outfile = Path(smi_dir, "BCD.png")img = Draw.MolToFile(rdkit, outfile, size=(500, 500))subprocess.run(f"mogrify -trim {outfile}", shell=True)# A cyclodextrin PDB file, BCD.pdb, is included.# VMD structures were created manually (top.tga, side.tga).# Structures were combined into one image (dimers_BCD.png) manually# Label the image with all structuresscript_file = Path(smi_dir, "structures_labelling.sh")subprocess.run(f"bash {script_file}{smi_dir}", shell=True);
Methods
Collective variables
Normal distance
Three collective variables were defined based on important configurations seen in unbiased simulations. The first was a signed distance from the cyclodextrin to the lignin dimer along the direction of a vector pointing from the cyclodextrin center through the secondary face.
\overrightarrow{CL} is the vector pointing from the cyclodextrin center of mass to the lignin center of mass and \overrightarrow{CS} is the vector pointing from the cyclodextrin center of mass to the center of the hydroxyl oxygen atoms in the cyclodextrin secondary face.
Code
"""The files related to this figure are in ./000_Attachments/fig-CS_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import MDAnalysis as mdaimport py3Dmol# PDB file namebcd_pdb ="000_Attachments/fig-CS_vector/BCD.pdb"# Load the PDB fileu = mda.Universe(bcd_pdb)# Select the atoms in the cyclodextrincyclodextrin = u.select_atoms("resname BCDC")# Calculate the center of mass of the cyclodextrincom_cyclodextrin = cyclodextrin.center_of_mass()# Select the oxygen atoms in the secondary face (O2 and O3)oxygen_atoms = cyclodextrin.select_atoms("name O2 O3")# Calculate the center of mass of the oxygen atomscom_oxygen = oxygen_atoms.center_of_mass()# Calculate the CS vectorcs_vector = com_oxygen - com_cyclodextrin# Create a py3Dmol viewview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(bcd_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the CS vectorview.addArrow( {"start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},"end": {"x": com_oxygen[0], "y": com_oxygen[1], "z": com_oxygen[2]},"radius": 0.2,"color": "green", })# Label the CS vector with an offsetlabel_offset = [0.3, 0, 0]view.addLabel("CS", {"position": {"x": com_cyclodextrin[0] + cs_vector[0] + label_offset[0],"y": com_cyclodextrin[1] + cs_vector[1] + label_offset[1],"z": com_cyclodextrin[2] + cs_vector[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the moleculeview.setBackgroundColor("white")view.zoomTo()# Rotate the view to show the CS vectorview.rotate(-70, "x")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 2: \overrightarrow{CS} represented as a green arrow. In the cyclodextrin structure, gray atoms are carbon and red atoms are oxygen. Hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-CL_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import numpy as npimport MDAnalysis as mdaimport py3Dmol# PDB file namesbcd_pdb ="000_Attachments/fig-CL_vector/BCD.pdb"lignin_pdb ="000_Attachments/fig-CL_vector/GG.pdb"# Load the PDB file for ligninu_lignin = mda.Universe(lignin_pdb)# Select the lignin moleculelignin = u_lignin.select_atoms("all")# Calculate the center of mass of the lignin moleculecom_lignin = lignin.center_of_mass()# Load the PDB file for cyclodextrinu_cyclodextrin = mda.Universe(bcd_pdb)# Select the cyclodextrin moleculecyclodextrin = u_cyclodextrin.select_atoms("all")# Calculate the center of mass of the cyclodextrin moleculecom_cyclodextrin = cyclodextrin.center_of_mass()# Calculate the CL vectorcl_vector = com_lignin - com_cyclodextrin# Recalculate the CL vectorcom_lignin = lignin.center_of_mass()cl_vector = com_lignin - com_cyclodextrin# Create a py3Dmol viewview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(bcd_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the CL vectorview.addArrow( {"start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},"end": {"x": com_lignin[0], "y": com_lignin[1], "z": com_lignin[2]},"radius": 0.2,"color": "green", })# Label the CL vectorlabel_offset = [-1.2, 0.4, 0]view.addLabel("CL", {"position": {"x": com_cyclodextrin[0] + label_offset[0],"y": com_cyclodextrin[1] + label_offset[1],"z": com_cyclodextrin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the cyclodextrin moleculelabel_offset = [0, 8.8, 0]view.addLabel("cyclodextrin", {"position": {"x": com_cyclodextrin[0] + label_offset[0],"y": com_cyclodextrin[1] + label_offset[1],"z": com_cyclodextrin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the lignin moleculelabel_offset = [0.5, 5, 0]view.addLabel("lignin dimer 1", {"position": {"x": com_lignin[0] + label_offset[0],"y": com_lignin[1] + label_offset[1],"z": com_lignin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Zoom in a bitview.zoom(1.4)# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 3: \overrightarrow{CL} represented as a green arrow. In the molecule structures, gray atoms are carbon and red atoms are oxygen. Hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-dnorm.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB files with mdanalysiscyclodextrin_pdb ="000_Attachments/fig-dnorm/BCD.pdb"lignin_pdb ="000_Attachments/fig-dnorm/GG.pdb"cyclodextrin = mda.Universe(cyclodextrin_pdb)lignin = mda.Universe(lignin_pdb)# Calculate centers of masscyclodextrin_com = cyclodextrin.atoms.center_of_mass()oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()lignin_com = lignin.atoms.center_of_mass()# Calculate vectorsCS_vector = oxygen_com - cyclodextrin_comCL_vector = lignin_com - cyclodextrin_com# Calculate dnormdnorm = np.dot(CS_vector, CL_vector) / np.linalg.norm(CS_vector)# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representation. Make it translucent.view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add green cylinder to represent dnormstart = cyclodextrin_comend = cyclodextrin_com + dnorm * (CS_vector / np.linalg.norm(CS_vector))view.addCylinder( {"start": {"x": start[0], "y": start[1], "z": start[2]},"end": {"x": end[0], "y": end[1], "z": end[2]},"radius": 0.2,"color": "green", })# Label dnorm at the center of the cylinder with an offset.# Use a white background, black text, and a font size of 20.cylinder_center = (start + end) /2label_offset = [0.25, 0, 0]view.addLabel("dₙ", {"position": {"x": cylinder_center[0] + label_offset[0],"y": cylinder_center[1] + label_offset[1],"z": cylinder_center[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the cyclodextrin moleculelabel_offset = [0, 0, 3.8]view.addLabel("cyclodextrin", {"position": {"x": cyclodextrin_com[0] + label_offset[0],"y": cyclodextrin_com[1] + label_offset[1],"z": cyclodextrin_com[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the lignin moleculelabel_offset = [-1.65, 0, 3.9]view.addLabel("lignin dimer 1", {"position": {"x": lignin_com[0] + label_offset[0],"y": lignin_com[1] + label_offset[1],"z": lignin_com[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate the view to see the dnorm vectorview.rotate(90, "x")# Zoom in a bitview.zoom(1.1)view.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 4: The signed distance from the cyclodextrin to the lignin dimer along \overrightarrow{CS}, d_n, shown as a green cylinder. In the molecule structures, gray atoms are carbon and red atoms are oxygen. Cyclodextrin is translucent. Hydrogen atoms are not shown.
Tangential distance
The second collective variable was the distance from the cyclodextrin center of mass to the lignin center of mass in the direction perpendicular to \overrightarrow{CS}. It is the length of the second leg of a right triangle with hypotenuse |\overrightarrow{CL}| and one leg d_n.
"""The files related to this figure are in ./000_Attachments/fig-dtang.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import MDAnalysis as mdaimport numpy as npimport py3Dmol# PDB file pathscyclodextrin_pdb ="000_Attachments/fig-dtang/BCD.pdb"lignin_pdb ="000_Attachments/fig-dtang/GG.pdb"# Load the PDB filesu_cyclodextrin = mda.Universe(cyclodextrin_pdb)u_lignin = mda.Universe(lignin_pdb)# Calculate the center of mass for cyclodextrin and lignincom_cyclodextrin = u_cyclodextrin.atoms.center_of_mass()com_lignin = u_lignin.atoms.center_of_mass()# Calculate the center of mass for oxygen atoms in cyclodextrin secondary faceoxygen_atoms = u_cyclodextrin.select_atoms("name O2 O3")com_oxygen = oxygen_atoms.center_of_mass()# Calculate the vectorsCS_vector = com_oxygen - com_cyclodextrinCL_vector = com_lignin - com_cyclodextrin# Calculate the projection of CL_vector onto the plane perpendicular to CS_vectorCS_unit_vector = CS_vector / np.linalg.norm(CS_vector)projection = np.dot(CL_vector, CS_unit_vector) * CS_unit_vectordtang_vector = CL_vector - projectiondtang = np.linalg.norm(dtang_vector)# Create the py3Dmol visualizationview = py3Dmol.view()view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.addModel(open(lignin_pdb, "r").read(), "pdb")# Set CPK representation for both molecules. Make the cyclodextrin translucent.view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add a green cylinder to represent dtangstart = com_cyclodextrinend = com_cyclodextrin + dtang_vectorview.addCylinder( {"start": {"x": float(start[0]), "y": float(start[1]), "z": float(start[2])},"end": {"x": float(end[0]), "y": float(end[1]), "z": float(end[2])},"radius": 0.2,"color": "green", })# Label dtanglabel_offset = [-5, 0, 0.3]view.addLabel("dₜ", {"position": {"x": float(end[0] + label_offset[0]),"y": float(end[1] + label_offset[1]),"z": float(end[2] + label_offset[2]), },"backgroundColor": "white","fontColor": "black","fontSize": 20, },)# Label cyclodextrinlabel_offset = [-4, 0, 3.8]view.addLabel("cyclodextrin", {"position": {"x": float(com_cyclodextrin[0] + label_offset[0]),"y": float(com_cyclodextrin[1] + label_offset[1]),"z": float(com_cyclodextrin[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlabel_offset = [-1.7, 0, 3.8]view.addLabel("lignin dimer 1", {"position": {"x": float(com_lignin[0] + label_offset[0]),"y": float(com_lignin[1] + label_offset[1]),"z": float(com_lignin[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate the view to show dtangview.rotate(90, "x")view.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 5: The distance from the cyclodextrin center of mass to the lignin center of mass in the direction perpendicular to \overrightarrow{CS}, d_t, represented as a green cylinder. In the molecule structures, gray atoms are carbon and red atoms are oxygen. Cyclodextrin is translucent. Hydrogen atoms are not shown.
Relative orientation
The third collective variable was the cosine of the angle between the vector from the center of the ring atoms in the lignin head to the center of the ring atoms in the lignin tail (\overrightarrow{HT}) and \overrightarrow{CS}.
"""The files related to this figure are in ./000_Attachments/fig-HT_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB file with MDAnalysislignin_pdb ="000_Attachments/fig-HT_vector/GG.pdb"lignin = mda.Universe(lignin_pdb)# Lignin head and tail positions for HT vectorlignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add green arrow to represent HT vectorview.addArrow( {"start": {"x": float(lignin_head[0]),"y": float(lignin_head[1]),"z": float(lignin_head[2]), },"end": {"x": float(lignin_tail[0]),"y": float(lignin_tail[1]),"z": float(lignin_tail[2]), },"radius": 0.2,"color": "green", })# Label HTlabel_offset = [1, 0, 0]view.addLabel("HT", {"position": {"x": float(lignin_tail[0] + label_offset[0]),"y": float(lignin_tail[1] + label_offset[1]),"z": float(lignin_tail[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlignin_com = lignin.atoms.center_of_mass()label_offset = [-3, 0, 6]view.addLabel("lignin dimer 1", {"position": {"x": float(lignin_com[0] + label_offset[0]),"y": float(lignin_com[1] + label_offset[1]),"z": float(lignin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate for better view of the HT vectorview.rotate(-90, "x")view.rotate(-30, "z")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 6: The lignin head to tail vector, \overrightarrow{HT}, represented as a green arrow. In the translucent lignin structure, gray atoms are carbon and red atoms are oxygen. Hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-theta.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB files with MDAnalysiscyclodextrin_pdb ="000_Attachments/fig-theta/BCD.pdb"lignin_pdb ="000_Attachments/fig-theta/GG.pdb"cyclodextrin = mda.Universe(cyclodextrin_pdb)lignin = mda.Universe(lignin_pdb)# Calculate centers of masscyclodextrin_com = cyclodextrin.atoms.center_of_mass()oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()# Calculate CS vectorCS_vector = oxygen_com - cyclodextrin_com# Calculate HT vectorlignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()HT_vector = lignin_head - lignin_tail# Shift HT vector to originate at cyclodextrin center of massHT_vector_shifted = HT_vector + cyclodextrin_com# Calculate the cosine of the angle θcos_theta = np.dot(HT_vector, CS_vector) / (np.linalg.norm(HT_vector) * np.linalg.norm(CS_vector))theta = np.arccos(cos_theta)# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add green arrows to represent HT and CS vectorsview.addArrow( {"start": {"x": float(cyclodextrin_com[0]),"y": float(cyclodextrin_com[1]),"z": float(cyclodextrin_com[2]), },"end": {"x": float(HT_vector_shifted[0]),"y": float(HT_vector_shifted[1]),"z": float(HT_vector_shifted[2]), },"radius": 0.2,"color": "green", })view.addArrow( {"start": {"x": float(cyclodextrin_com[0]),"y": float(cyclodextrin_com[1]),"z": float(cyclodextrin_com[2]), },"end": {"x": float(oxygen_com[0]), "y": float(oxygen_com[1]), "z": float(oxygen_com[2])},"radius": 0.2,"color": "green", })# Label HT, CS, and θlabel_offset = [0, 0, 1]view.addLabel("HT", {"position": {"x": float(HT_vector_shifted[0] + label_offset[0]),"y": float(HT_vector_shifted[1] + label_offset[1]),"z": float(HT_vector_shifted[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)label_offset = [-3, 0, 1]view.addLabel("CS", {"position": {"x": float(oxygen_com[0] + label_offset[0]),"y": float(oxygen_com[1] + label_offset[1]),"z": float(oxygen_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)label_offset = [0, 0, 1.6]view.addLabel("θ", {"position": {"x": float(cyclodextrin_com[0] + label_offset[0]),"y": float(cyclodextrin_com[1] + label_offset[1]),"z": float(cyclodextrin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label cyclodextrinlabel_offset = [-4, 0, 5.8]view.addLabel("cyclodextrin", {"position": {"x": float(cyclodextrin_com[0] + label_offset[0]),"y": float(cyclodextrin_com[1] + label_offset[1]),"z": float(cyclodextrin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlignin_com = lignin.atoms.center_of_mass()label_offset = [-9, 0, -1]view.addLabel("lignin dimer 1", {"position": {"x": float(lignin_com[0] + label_offset[0]),"y": float(lignin_com[1] + label_offset[1]),"z": float(lignin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate for better view of θview.rotate(-90, "x")view.rotate(-30, "z")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 7: The angle \theta between \overrightarrow{HT} and \overrightarrow{CS} represented as a green arrow. The origin of \overrightarrow{HT} is shifted to the cyclodextrin center of mass. In the molecule structures, gray atoms are carbon and red atoms are oxygen. Cyclodextrin is translucent. Hydrogen atoms are not shown.
The collective variables were computed using PLUMED4 2.6.6.
HDBSCAN clustering
Determination of min_samples
The best value of min_samples = min_cluster_size used in the HDBSCAN algorithm was determined using the following algorithm:
Find the spike positions where the number of clusters as a function of min_samples changes, stays at the new value for 3 or less points, then changes back.
If the spike with the largest value of min_samples is upward, then the best value of min_samples is the value immediately following the value corresponding to the spike. If the spike with the largest value of min_samples is downward, the best value of min_samples is found at the index corresponding to the next occurrence of the number of clusters equal to the number of clusters at the spike.
Results
Number of clusters
The numbers of clusters as a function of min_samples are shown in Figure 8, Figure 9, and Figure 10. The best values for min_samples and the corresponding numbers of clusters were determined using the algorithm described in Determination of min_samples.
Figure 8: Number of clusters for dimer 1 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_GG.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 9: Number of clusters for dimer 2 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_TGG.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_TGG/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 10: Number of clusters for dimer 3 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_GG_BB.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG_BB/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Cluster scatter plots
The 3D scatter plots of the collective variables showing the HDBSCAN clusters for each dimer are shown in Figure 11, Figure 12, and Figure 13.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_GG.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_GG/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 11: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 1. Representative configurations for each cluster are shown around the static scatter plot except for the cyan and brown clusters which include configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_TGG.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters_TGG/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_TGG/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 12: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 2. Representative configurations for each cluster are shown around the static scatter plot except for the orange cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_GG_BB.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()# Run the snakemake commandsubprocess.run( ["snakemake","-s","plumed.smk","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG_BB/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_GG_BB/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 13: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 3. Representative configurations for each cluster are shown around the static scatter plot except for the purple cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Proportion of configurations belonging to each cluster: Original min_samples values
The proportions of the configurations with a lignin dimer bound to the center of the cyclodextrin belonging to each cluster for each dimer were calculated; configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all were not considered in the total number of configurations. Clusters with similar binding configurations were grouped together and proportions were computed for those groups as well. The proportions are shown in Table 1, Table 2, and Table 3.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_GG/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_GG/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 1: Proportions of center-bound configurations belonging to each cluster for dimer 1. Refer to Figure 11 to see the clusters and configurations corresponding to the label numbers. The head:sec, center:sec, and tail:sec labels refer to cluster groups which may include configurations from multiple clusters which are listed in the rows above them up until another group is encountered. The head:sec group refers to configurations where the lignin dimer head is closer to the center of the cyclodextrin than the lignin dimer tail or center and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face. The center:sec group refers to configurations where the lignin dimer center is closer to the center of the cyclodextrin than the lignin dimer head or tail and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face. The tail:sec group refers to configurations where the lignin dimer tail is closer to the center of the cyclodextrin than the lignin dimer head or center and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face.
Label
Fraction
4
0.0020
5
0.0060
6
0.5051
7
0.0054
8
0.0067
head:sec
0.5253
2
0.0571
center:sec
0.0571
3
0.4176
tail:sec
0.4176
There are 3 cluster groups defined for dimer 1: head:sec, center:sec, and tail:sec (see the caption of Table 1 for the definitions). The head:sec group consists of 5 clusters. However, most of the configurations belong to cluster 6. Clusters 4, 5, 7 and 8 account for a total of about 2% of all configurations, while cluster 6 has about 50% of all configurations. The center:sec and tail:sec groups each consist of a single cluster. It may be best to increase min_samples so that the configurations in clusters 4-8 are merged into one cluster.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_TGG/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_TGG/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 2: Proportions of center-bound configurations belonging to each cluster for dimer 2. Refer to Figure 12 to see the clusters and configurations corresponding to the label numbers. The head:sec, center:sec, and tail:pri labels refer to cluster groups which may include configurations from multiple clusters which are listed in the rows above them up until another group is encountered. The head:sec group refers to configurations where the lignin dimer head is closer to the center of the cyclodextrin than the lignin dimer tail or center and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face. The center:sec group refers to configurations where the lignin dimer center is closer to the center of the cyclodextrin than the lignin dimer head or tail and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face. The tail:pri group refers to configurations where the lignin dimer tail is closer to the center of the cyclodextrin than the lignin dimer head or center and the center of the lignin dimer is closer to the cyclodextrin primary face than the cyclodextrin secondary face.
Label
Fraction
4
0.0229
5
0.7153
head:sec
0.7382
2
0.2300
3
0.0280
center:sec
0.2580
1
0.0037
tail:pri
0.0037
There are 3 cluster groups defined for dimer 2: head:sec, center:sec, and tail:pri (see the caption of Table 2 for the definitions). The head:sec group consists of 2 clusters. However, most of the configurations belong to cluster 5. Cluster 4 accounts for about 2% of all configurations, while cluster 5 has about 72% of all configurations. The center:sec group consists of 2 clusters. However, most of the configurations belong to cluster 2. Cluster 3 accounts for about 3% of all configurations, while cluster 2 has about 23% of all configurations. The tail:pri group consists of a single cluster. It may be best to increase min_samples so that the configurations in clusters 4 and 5 and clusters 2 and 3 are merged into two clusters.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_GG_BB/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_GG_BB/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 3: Proportions of center-bound configurations belonging to each cluster for dimer 3. Refer to Figure 13 to see the clusters and configurations corresponding to the label numbers. The end:sec and center labels refer to cluster groups which may include configurations from multiple clusters which are listed in the rows above them up until another cluster group label is encountered. The end:sec group refers to configurations where one of the lignin dimer ends is closer to the center of the cyclodextrin than the lignin dimer center and the center of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face. The center group refers to configurations where the lignin dimer center is near the cyclodextrin center.
Label
Fraction
1
0.0717
3
0.0027
end:sec
0.0744
2
0.9256
center
0.9256
There are 2 cluster groups defined for dimer 2: head:sec, center:sec, and tail:pri (see the caption of Table 3 for the definitions). The head:sec group consists of 2 clusters. However, most of the configurations belong to cluster 5. Cluster 4 accounts for about 2% of all configurations, while cluster 5 has about 72% of all configurations. The center:sec group consists of 2 clusters. However, most of the configurations belong to cluster 2. Cluster 3 accounts for about 3% of all configurations, while cluster 2 has about 23% of all configurations. The tail:pri group consists of a single cluster. It may be best to increase min_samples so that the configurations in clusters 4 and 5 and clusters 2 and 3 are merged into two clusters.
Proportion of configurations belonging to each cluster: Merged clusters
As discussed in Proportion of configurations belonging to each cluster: Original min_samples values, each cluster group typically includes one cluster that contains the majority of configurations within that group. Consequently, for groups containing multiple clusters, these clusters were merged by increasing the min_samples parameter. This also allows for direct comparison with the cluster fractions reported in the previous study1.
Conclusions
sdf
Future work
fwe
References
(1)
Dean, K. R.; Novak, B.; Moradipour, M.; Tong, X.; Moldovan, D.; Knutson, B. L.; Rankin, S. E.; Lynn, B. C. Complexation of Lignin Dimers with β-Cyclodextrin and Binding Stability Analysis by ESI-MS, Isothermal Titration Calorimetry, and Molecular Dynamics Simulations. J. Phys. Chem. B2022, 126 (8), 1655–1667. https://doi.org/10.1021/acs.jpcb.1c09190.
Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun.2014, 185 (2), 604–613. https://doi.org/10.1016/j.cpc.2013.09.018.